Fine-grained semantic segmentation of a person's face and head, including facial parts and head components, has progressed a great deal in recent years. However, it remains a challenging task, in which ambiguous occlusions and large pose variations are particularly difficult to handle. To overcome these difficulties, we propose a novel framework termed Mask-FPAN. It uses a de-occlusion module that learns to parse occluded faces in a semi-supervised way. In particular, face landmark localization, face occlusion estimations, and detected head poses are taken into account. A 3D morphable face model combined with the UV GAN improves the robustness of 2D face parsing. In addition, we introduce two new datasets named FaceOccMask-HQ and CelebAMaskOcc-HQ for face parsing work. The proposed Mask-FPAN framework addresses the face parsing problem in the wild and shows significant performance improvements, with mIoU rising from 0.7353 to 0.9013 compared to the state-of-the-art on challenging face datasets.
Translated by Google Translate
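The Mask-FPAN abstract reports its gains as mean intersection-over-union (mIoU). As a rough illustration of how that metric is commonly computed for parsing maps, here is a minimal sketch in plain Python. The function name `mean_iou`, the toy label maps, and the class labels are assumptions made for illustration; the abstract does not say which classes are averaged or how absent classes are handled.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy label maps; 0 = background, 1 = skin, 2 = hair (illustrative labels)
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(score, 4))
```

Per-class IoU here is 1, 1/2, and 2/3, so the averaged score is 13/18.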
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
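Two of the practices the survey asks about, K-fold cross-validation on the training set and ensembling of several models, can be sketched as follows. This is only an illustration under assumed names (`k_fold_indices`, `ensemble_mean`) and toy numbers; the survey does not prescribe any particular implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size


def ensemble_mean(predictions):
    """Average per-class probabilities produced by several models for one sample."""
    n_models = len(predictions)
    n_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / n_models for c in range(n_classes)]


folds = list(k_fold_indices(n_samples=10, k=5))
avg = ensemble_mean([[0.9, 0.1], [0.7, 0.3]])  # two hypothetical model outputs
```

In practice one model is typically trained per fold and the fold models are then averaged, which is one way the "multiple identical models" ensembles mentioned above can arise.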
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views while preserving specific details of the input image. High-fidelity 3D GAN inversion is inherently challenging due to the geometry-texture trade-off in 3D inversion, where overfitting to a single view input image often damages the estimated geometry during the latent optimization. To solve this challenge, we propose a novel pipeline that builds on the pseudo-multi-view estimation with visibility analysis. We keep the original textures for the visible parts and utilize generative priors for the occluded parts. Extensive experiments show that our approach achieves advantageous reconstruction and novel view synthesis quality over state-of-the-art methods, even for images with out-of-distribution textures. The proposed pipeline also enables image attribute editing with the inverted latent code and 3D-aware texture modification. Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
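The idea of keeping the original texture where the input view sees the surface and falling back on the generative prior elsewhere can be pictured as a per-pixel blend. The sketch below is an assumption-laden simplification: `blend_by_visibility`, the flat pixel lists, and the binary visibility mask are all hypothetical, and the abstract does not state that the pipeline uses exactly this compositing rule.

```python
def blend_by_visibility(warped_input, generated, visibility):
    """Per-pixel blend: input texture where visible (v=1), generated texture where occluded (v=0)."""
    return [v * x + (1 - v) * g for x, g, v in zip(warped_input, generated, visibility)]


warped_input = [0.2, 0.4, 0.6]  # input texture warped to the novel view (toy values)
generated = [0.9, 0.8, 0.7]     # texture proposed by the generative prior (toy values)
visibility = [1, 0, 1]          # 1 = visible in the input view, 0 = occluded
pseudo_view = blend_by_visibility(warped_input, generated, visibility)
```

A soft mask with values between 0 and 1 would work with the same function and would smooth the seam between the two sources.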
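CBCTs in image-guided radiotherapy provide crucial anatomical information for patient setup and plan evaluation. Longitudinal CBCT image registration can quantify inter-fractional anatomical changes. The purpose of this study is to propose an unsupervised deep learning-based CBCT-CBCT deformable image registration method. The proposed deformable registration workflow consists of training and inference stages that share the same feed-forward path through a spatial transformation-based network (STN). The STN consists of a global generative adversarial network (GlobalGAN) and a local GAN (LocalGAN), which predict the coarse- and fine-scale motions, respectively. The network was trained by minimizing an image similarity loss and a deformable vector field (DVF) regularization loss, without supervision from ground-truth DVFs. During the inference stage, patches of local DVF were predicted by the trained LocalGAN and fused to form a whole-image DVF. The local whole-image DVF was subsequently combined with the DVF generated by GlobalGAN to obtain the final DVF. The proposed method was evaluated using 100 fractional CBCTs from 20 abdominal cancer patients in the experiments and 105 fractional CBCTs from a cohort of 21 different abdominal cancer patients in a holdout test. Qualitatively, the registration results show that the deformed CBCT images align with the target CBCT images. Quantitatively, the average target registration error (TRE) calculated on the fiducial markers and manually identified landmarks was 1.91 ± 1.11 mm. The average mean absolute error (MAE) and normalized cross-correlation (NCC) between the deformed CBCT and the target CBCT were 33.42 ± 7.48 HU and 0.94 ± 0.04, respectively. This promising registration method could provide fast and accurate longitudinal CBCT alignment to facilitate the analysis and prediction of inter-fractional anatomical changes.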
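The three quantitative measures named in the registration abstract (TRE, MAE, NCC) can be illustrated with a short sketch. The function names and toy inputs are assumptions, and the NCC below is the common zero-mean form; the abstract does not give the exact definitions used in the study.

```python
import math


def mean_absolute_error(a, b):
    """Mean absolute intensity difference (e.g. in HU) between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation between two flattened images."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def target_registration_error(moved_pts, target_pts):
    """Mean Euclidean distance between corresponding landmarks (same unit as the inputs)."""
    dists = [math.dist(p, q) for p, q in zip(moved_pts, target_pts)]
    return sum(dists) / len(dists)


deformed = [0.0, 10.0, 20.0, 30.0]  # toy intensities of the deformed image
target = [2.0, 12.0, 22.0, 32.0]    # toy intensities of the target image
mae = mean_absolute_error(deformed, target)
ncc = normalized_cross_correlation(deformed, target)
tre = target_registration_error([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)])
```

With these toy inputs the images differ by a constant offset, so the MAE equals that offset while the NCC stays at 1; the two measures capture different aspects of agreement.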
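In this paper, we tackle the problem of monocular bokeh synthesis, where we attempt to render a shallow depth-of-field image from a single all-in-focus image. Unlike in DSLR cameras, this effect cannot be captured directly in mobile cameras due to the physical constraints of the mobile aperture. We therefore propose a network-based approach that is capable of rendering realistic monocular bokeh from single image inputs. To do this, we introduce three new edge-aware bokeh losses based on a predicted monocular depth map, which sharpen the foreground edges while blurring the background. The model is then fine-tuned using an adversarial loss to generate a realistic bokeh effect. Experimental results show that our approach is capable of generating a pleasing, natural bokeh effect with sharp edges while handling complicated scenes.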
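Generative adversarial networks (GANs) have drawn enormous attention due to their simple yet effective training mechanism and superior image generation quality. With the ability to generate photo-realistic high-resolution (e.g., $1024 \times 1024$) images, recent GAN models have greatly narrowed the gap between generated images and real ones. Therefore, many recent works show emerging interest in taking advantage of pre-trained GAN models by exploiting the well-disentangled latent space and the learned GAN priors. In this paper, we briefly review recent progress on leveraging pre-trained large-scale GAN models from three aspects, i.e., 1) the training of large-scale generative adversarial networks, 2) exploring and understanding the pre-trained GAN models, and 3) leveraging these models for subsequent tasks such as image restoration and editing. More information about relevant methods and repositories can be found at https://github.com/csmliu/pretretaining-gans.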
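Medical ultrasound (US) is one of the most widely used imaging modalities in clinical practice. However, its use presents unique challenges, such as variable imaging quality. Deep learning (DL) can be used as an advanced medical image analysis tool, but the performance of DL models is greatly limited by the scarcity of large datasets. Here, we develop semi-supervised classification enhancement (SSCE) structures by combining a convolutional neural network (CNN) and a generative adversarial network (GAN) to address the data shortage. A breast cancer dataset with 780 images is used as our base dataset. The results show that our SSCE structures obtain an accuracy of up to 97.9%, a maximum improvement of 21.6% compared with using CNN models alone, and outperform previous methods using the same dataset by up to 23.9%. We believe our proposed state-of-the-art method can be regarded as a potential auxiliary tool for the diagnosis of medical US images.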
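Exploiting a general-purpose neural architecture to replace hand-crafted designs or inductive biases has recently drawn extensive interest. However, existing tracking approaches rely on customized sub-modules and need prior knowledge for architecture selection, hindering the development of tracking within more general systems. This paper presents a simplified tracking architecture (SimTrack) that leverages a transformer backbone for joint feature extraction and interaction. Unlike existing Siamese trackers, we serialize the input images and concatenate them directly before the one-branch backbone. Feature interaction in the backbone helps to remove carefully designed interaction modules and produces a more efficient and effective framework. To reduce the information loss from down-sampling in vision transformers, we further propose a foveal window strategy, providing more diverse input patches at an acceptable computational cost. Our SimTrack improves the baseline with 2.5%/2.6% AUC gains on LaSOT/TNL2K and achieves results competitive with other specialized tracking algorithms without bells and whistles.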
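The step of serializing the input images and concatenating them before a single backbone can be pictured with a toy patchification. This is a minimal sketch under stated assumptions: `patchify` and `serialize_pair` are hypothetical helpers, the images are tiny nested lists, and the linear projection, positional embeddings, and foveal window of the actual tracker are left out.

```python
def patchify(image, patch):
    """Split an HxW image (list of rows) into flattened patch tokens in row-major order."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def serialize_pair(template, search, patch):
    """Concatenate template and search tokens into one sequence for a one-branch backbone."""
    return patchify(template, patch) + patchify(search, patch)


template = [[r * 4 + c for c in range(4)] for r in range(4)]      # 4x4 toy exemplar
search = [[100 + r * 8 + c for c in range(8)] for r in range(8)]  # 8x8 toy search region
tokens = serialize_pair(template, search, patch=2)
```

Because both images end up in one token sequence, self-attention in the backbone can relate template and search tokens directly, which is the kind of interaction a separate Siamese correlation module would otherwise provide.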
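Creating what-if stories requires reasoning about prior statements and the possible outcomes of changed conditions. People can easily generate coherent endings under new conditions, but it is challenging for current systems to do so with minimal changes to the original story. Therefore, one major challenge is the trade-off between generating a logical story and rewriting with minimal edits. In this paper, we propose EDUCAT, an editing-based unsupervised approach for counterfactual story rewriting. EDUCAT includes a target position detection strategy based on estimating the causal effects of the what-if conditions, which keeps the causally invariant parts of the story. EDUCAT then generates the stories under fluency, coherence, and minimal-edit constraints. We also propose a new metric to alleviate the shortcomings of current automatic metrics and to better evaluate the trade-off. We evaluate EDUCAT on a public counterfactual story rewriting benchmark. Experiments show that EDUCAT achieves the best trade-off among unsupervised SOTA methods according to both automatic and human evaluation. The resources of EDUCAT are available at: https://github.com/jiangjiechen/educat.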
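Diagnosing Alzheimer's disease (AD) in its early stages is essential for timely treatment to slow further deterioration. Visualizing the morphological features of the early stages of AD is of great clinical value. In this work, a novel multidirectional perception generative adversarial network (MP-GAN) is proposed to visualize the morphological features that indicate the severity of AD for patients at different stages. Specifically, by introducing a novel multidirectional mapping mechanism into the model, the proposed MP-GAN can capture the salient global features efficiently. Thus, by utilizing the class-discriminative map from the generator, the proposed model can clearly delineate subtle lesions via MR image transformations between the source domain and the pre-defined target domain. In addition, by integrating the adversarial loss, classification loss, cycle consistency loss, and L1 penalty, a single generator in MP-GAN can learn the class-discriminative maps for multiple classes. Extensive experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate that MP-GAN achieves superior performance compared with existing methods. The lesions visualized by MP-GAN are also consistent with what clinicians observe.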
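The MP-GAN abstract lists four terms that are integrated to train the single generator. One plausible way to write such a combined objective is a weighted sum, shown below. The additive form and the weights $\lambda_{\mathrm{cls}}$, $\lambda_{\mathrm{cyc}}$, and $\lambda_{1}$ are assumptions made for illustration; the abstract gives neither the weighting nor the exact definition of each term.

```latex
\mathcal{L}_{G}
  = \mathcal{L}_{\mathrm{adv}}
  + \lambda_{\mathrm{cls}} \, \mathcal{L}_{\mathrm{cls}}
  + \lambda_{\mathrm{cyc}} \, \mathcal{L}_{\mathrm{cyc}}
  + \lambda_{1} \, \mathcal{L}_{L1}
```

Here $\mathcal{L}_{\mathrm{adv}}$ is the adversarial loss, $\mathcal{L}_{\mathrm{cls}}$ the classification loss, $\mathcal{L}_{\mathrm{cyc}}$ the cycle consistency loss, and $\mathcal{L}_{L1}$ the L1 penalty named in the abstract.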